Abstract: This paper presents a method for jointly designing 
the transmitter-receiver pair in a block-by-block communication 
system that employs (intra-block) decision feedback detection. 
We provide closed-form expressions for transmitter-receiver pairs 
that simultaneously minimize the arithmetic mean squared error 
(MSE) at the decision point (assuming perfect feedback), the 
geometric MSE, and the bit error rate of a uniformly bit- 
loaded system at moderate-to-high signal-to-noise ratios. Sep- 
arate expressions apply for the "zero-forcing" and "minimum 
MSE" (MMSE) decision feedback structures. In the MMSE 
case, the proposed design also maximizes the Gaussian mutual 
information and suggests that one can approach the capacity 
of the block transmission system using (independent instances 
of) the same (Gaussian) code for each element of the block. 
Our simulation studies indicate that the proposed transceivers 
perform significantly better than standard transceivers, and that 
they retain their performance advantages in the presence of error 
propagation. 

Index Terms: block precoding; decision feedback detection; 
zero-forcing; minimum mean-square error; bit error rate; mutual 
information; channel capacity. 



I. Introduction 

Block-by-block communication is an effective scheme for 
the transmission of data over dispersive media; e.g., [28]- 
[30], [41], [42]. In such "vector" communication schemes, 
blocks of data are transmitted in a manner that avoids inter- 
ference between the received blocks, and hence the detector 
need only operate on a block-by-block basis. Two popular 
examples of block-by-block communication schemes are or- 
thogonal frequency division multiplexing (OFDM) [5] and 
discrete multi-tone modulation (DMT) [8]. In addition, certain 
multiple antenna systems operate in a block-by-block fashion 
(e.g., [18], [20], [26], [36], [43], [45]), and block-by-block 
detection schemes appear in some multiuser detectors for 
synchronous CDMA systems [14], [15], [47]. In general, an 
optimal detector for a block transmission system must make 
a decision on the received data block as a whole, although 
in certain cases, such as OFDM and DMT, the elements of 
that block can be decoupled and simpler detection schemes 
obtained. Unfortunately, maximum likelihood detection of the 
transmitted vector can be rather computationally expensive, 
and simpler detectors based on linear equalization and (dis- 
joint) symbol-by-symbol detection may incur a significant 
performance loss. A useful compromise between performance 
and complexity can be obtained by employing intra-block 
decision feedback detection [4], [10], [14], [15], [18], [20], 
[22], [28], [44], [47], [51]. In an intra-block decision feedback 
detector the individual symbols which constitute a given block 
are detected sequentially, with the "intra-block interference" 
from previously detected symbols being subtracted before 
the decision on the current symbol is made. Such schemes 
fall into the class of generalized decision feedback equaliz- 
ers [10]. In multiple antenna communication schemes intra- 
block decision feedback is sometimes referred to as "nulling 
and cancelling" [4], [18], [20], and in multi-user detection the 
corresponding concept is sometimes referred to as "successive 
interference cancellation" [14], [15], [22], [47]. 

The goal of the present paper is to jointly design the linear 
transmitter matrix and the receiver feedforward and feedback 
matrices so as to optimize the performance of a block-by-block 
communication system with an intra-block decision feedback 
detector (BDFD). The design is based on knowledge of the 
channel, and hence is an appropriate choice for systems in 
which there is timely, reliable feedback from the receiver to 
the transmitter. The proposed approach provides closed-form 
expressions for transceivers that minimize the arithmetic mean 
(over the block) of the expected squared errors (MSE) at the 
input to the (scalar) decision device that is implicit in the 
BDFD, under the standard assumption [3], [9], [10], [17], [40], 
[52] that the previous decisions were correct. The expressions 
depend on the nature of the BDFD, and separate expressions 
are provided for the zero-forcing (ZF) and minimum mean 
square error (MMSE) BDFDs. In order to help distinguish 
our designs from previous work, we point out that if one is 
given a transmitter matrix, the design of the feedforward and 
feedback matrices of a ZF or MMSE-BDFD that minimize the 
MSE is well known; e.g., [2], [4], [9], [10], [17], [20], [40]. 
However, the joint minimum MSE design of the transmitter 
and receiver matrices has previously been deemed to be 
difficult (e.g., [52, p. 1338]), and hence several authors have 
suggested minimizing a particular lower bound on the MSE, 
namely the geometric mean of the expected squared errors; 
e.g., [9], [10], [52]. We will minimize the geometric MSE 
as the first step in our approach, but we will also show how 
the unitary matrix that parameterizes the set of transceivers 
which minimize the geometric MSE can be chosen so that the 
(arithmetic) MSE attains its minimized lower bound. 

Transceivers designed in the manner we propose have 
several additional desirable properties. In particular, the inputs 
to the (scalar) decision device are uncorrelated and have equal 
signal-to-interference-and-noise ratios (SINRs). In fact, the 
minimum SINR over the elements of the block is maximized. 
As a result, the average bit error rate (BER) is (essentially) 
minimized. More precisely, for systems with a ZF-BDFD our 
design minimizes the average BER for (uncoded) uniform 
QPSK signalling at moderate-to-high signal-to-noise ratios 
(SNRs), and also minimizes the dominant components of the 
BER for uniform $M$-ary QAM signalling (see footnote 1). For systems with 
an MMSE-BDFD, our design minimizes the average BER 
under an assumption that the residual intra-block interference 
is Gaussian. 

For the MMSE-BDFD, it is reasonably well known [9], [10], 
[17], [40], [52] that any transmitter that minimizes the geo- 
metric MSE (including the proposed design) also maximizes 
the mutual information between the transmitter and receiver 
for Gaussian signals. However, the standard choice from the 
set of transmitters that minimize the geometric MSE does not 
minimize the (arithmetic) MSE and produces inputs to the 
decision device that have potentially different SINRs for each 
element of the block. Therefore, in order to achieve reliable 
communication at rates which approach the capacity of the 
block transmission system, different codes (and constellations) 
may need to be applied for each element of the block [10]. An 
advantage of the proposed design is that from within the set of 
transmitters that minimize the geometric MSE (and maximize 
the Gaussian mutual information), we obtain a transceiver that 
also minimizes the arithmetic MSE, minimizes the BER, and 
provides uncorrelated inputs to the decision device that have 
identical (and maximized) SINRs. Since the MMSE-BDFD is 
a "canonical" receiver [9], [10], [23], this suggests that by 
using the proposed design, reliable communication at rates 
approaching the capacity of the block transmission system 
can be achieved by using independent instances of the same 
(Gaussian) code in each element of the block. 

As mentioned earlier, our designs are based on the standard 
assumption [3], [9], [10], [17], [40], [52] that the previous 
symbols were correctly detected. However, error propaga- 
tion is not catastrophic in block-by-block communication 
schemes because errors can only propagate within a single 
block (e.g., [10] and Section II). Bounds for the conventional 
symbol-by-symbol decision feedback equalizer (DFE) [1], [16] 
also suggest that good performance should be maintained in 
the presence of error propagation, and our simulations confirm 
this prediction. Furthermore, our simulation studies indicate 
that the proposed transceivers perform significantly better than 
standard transceivers, and that they retain their performance 
advantages in the presence of error propagation. 

Notation: The notation adopted in this paper is fairly 
standard. We conform to the following conventions: scalars 
are denoted by lower case letters; vectors by bold lower 
case letters; and matrices by bold upper case letters. The 
symbol $\mathbf{I}_N$ denotes the identity matrix of size $N$, and $\mathbf{0}_{N \times M}$ 
denotes the $N \times M$ matrix of zeros. The symbol $|\mathbf{A}|$ denotes 
the determinant of a matrix $\mathbf{A}$, and $\mathrm{tr}(\mathbf{A})$ denotes its trace. 
The symbol $E[\cdot]$ denotes the expectation operator; $(\cdot)^H$ the 
complex-conjugate transpose operation; $(\cdot)^T$ the transpose 

1 Our design for the ZF-BDFD coincides with the one that minimizes the 
block error rate [56], [57], but the design approach taken in the present paper 
is substantially different from that taken in [56], [57]. 



operation; and $[\mathbf{A}]_{ij}$ denotes the element at the intersection of 
the ith row and jth column of a matrix. 

II. Block-by-block Transmission 

We consider the generic block-by-block transmission system 
with intra-block decision feedback detection illustrated in 
Fig. 1. In this system, a block of $M$ data symbols, $\mathbf{s}$, is linearly 
precoded to construct a block of $K \ge M$ channel symbols, 
$\mathbf{u} = \mathbf{F}\mathbf{s}$, which is transmitted over the channel. The receiver 
independently processes a block of $P \ge M$ received samples 
in order to detect the data vector s. The received block, y, can 
be written as 

y = HFs + v, (1) 

where the PxK matrix H captures the effects of the channel, 
and v is a length P vector of additive noise samples. We 
will assume that the noise is circularly symmetric [37] (or, 
proper [35]) and Gaussian, with zero mean and positive definite 
correlation matrix $E[\mathbf{v}\mathbf{v}^H] = \mathbf{R}_{vv}$. We will also assume 
that the data symbols have zero mean and are white (see footnote 2), of unit 
energy, and not correlated with the noise (i.e., $E[\mathbf{s}\mathbf{s}^H] = \mathbf{I}$ 
and $E[\mathbf{s}\mathbf{v}^H] = \mathbf{0}$). The model in (1) is applicable in many 
applications, including zero-padded or cyclic-prefixed block 
transmission over a scalar finite impulse response channel that 
is constant over the duration of the block; e.g., [6], [12], [28]- 
[30], [41], [42], [44]. In the zero-padded case H is a tall, lower 
triangular, full column rank Toeplitz matrix whose columns 
contain the impulse response of the channel, and in the cyclic- 
prefixed case H is a square circulant matrix whose columns 
contain the channel impulse response. The model in (1) is also 
applicable in: vector transmission over a narrowband multiple 
antenna channel (e.g., [18], [20]), in which case H has no 
deterministic structure; in space-time block transmission over a 
(quasi-static) narrowband multiple antenna channel (e.g., [26], 
[45]), in which case H has a block diagonal structure; and 
in block transmission over a (quasi-static) frequency-selective 
multiple antenna channel (e.g., [36], [43]), in which case H 
is either block Toeplitz or block circulant. 

The intra-block decision feedback detector first pre- 
processes the received block y with an M x P feedforward 
matrix W to form z = Wy. (The functional form of W 
depends on whether the ZF- or MMSE-BDFD is implemented; 
see Section III.) The detection of the transmitted symbols 
$s_m = [\mathbf{s}]_m$ then proceeds sequentially, starting from $m = M$, 
by making a scalar decision on $\hat{s}_M = z_M$ and then on 
$\hat{s}_m = z_m - \breve{s}_m$, $m = M-1, M-2, \ldots, 1$, where 
$\breve{s}_m = \sum_{\ell=m+1}^{M} b_{m\ell}\,\tilde{s}_\ell$ is 
the output of the feedback filter, with $b_{m\ell}$ being its coefficients. 
The states of that filter, $\tilde{s}_\ell$, are the previously detected symbols 
in the block and the filter coefficients are different for each 
element of the block (indexed by m). Once a given block has 
been detected, the states of the feedback filter are reset to zero. 
That is, the symbols are detected on a block-by-block basis 
and hence error propagation between blocks is avoided. 
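
The sequential detection described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `bdfd_detect` and `qpsk_slicer`, the unit-energy QPSK constellation, and the example values of `B` and `s` are all assumptions introduced here. The vector `z` stands for the feedforward output $\mathbf{z} = \mathbf{W}\mathbf{y}$, and `B` holds the feedback coefficients $b_{m\ell}$ in a strictly upper triangular array.

```python
def bdfd_detect(z, B, slicer):
    """Detect one block from z = W y with intra-block decision feedback.

    z      : list of M complex feedforward outputs
    B      : M x M nested list, strictly upper triangular feedback coefficients
    slicer : maps a complex scalar to the nearest constellation point
    """
    M = len(z)
    s_tilde = [0j] * M                      # feedback states, reset for every block
    for m in range(M - 1, -1, -1):          # start from the last element of the block
        # output of the feedback filter: sum over l > m of b_{ml} * s~_l
        feedback = sum(B[m][l] * s_tilde[l] for l in range(m + 1, M))
        s_tilde[m] = slicer(z[m] - feedback)
    return s_tilde


def qpsk_slicer(x):
    """Nearest point of a unit-energy QPSK constellation (assumed alphabet)."""
    re = 1.0 if x.real >= 0 else -1.0
    im = 1.0 if x.imag >= 0 else -1.0
    return complex(re, im) / 2 ** 0.5


# Noiseless illustration: z = (B + I) s, so every decision is correct.
a = 2 ** -0.5
s = [complex(a, a), complex(-a, a), complex(a, -a)]
B = [[0, 0.5, -0.3j], [0, 0, 0.2 + 0.1j], [0, 0, 0]]
z = [s[m] + sum(B[m][l] * s[l] for l in range(m + 1, 3)) for m in range(3)]
detected = bdfd_detect(z, B, qpsk_slicer)
```

Because the states are local to each call, nothing is carried from one block to the next, which mirrors the reset of the feedback filter between blocks.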

2 In the case where $E[\mathbf{s}\mathbf{s}^H]$ is not a scaled identity matrix, a data whitening 
matrix can readily be absorbed into the precoder, so long as the data covariance 
matrix is known (and full rank). 
Fig. 1. A generic block-by-block communication system with intra-block decision feedback detection. The P/S block denotes parallel-to-serial conversion 
with the last element of the input block becoming the first output, and the S/P block denotes serial-to-parallel conversion with the first input becoming the 
last element of the output block. 
Fig. 2. A convenient conceptual model for Fig. 1. 

If the filter coefficients $b_{m\ell}$ are arranged in a strictly upper 
triangular matrix $\mathbf{B}$, then the operation of the block transceiver in Fig. 1 is equivalent 
to successively making decisions on the elements of 

$$\hat{\mathbf{s}} = \mathbf{W}\mathbf{H}\mathbf{F}\mathbf{s} + \mathbf{W}\mathbf{v} - \mathbf{B}\tilde{\mathbf{s}}, \tag{2}$$



starting from the Mth row. That interpretation leads to the 
convenient conceptual model in Fig. 2. We observe that when 
$\mathbf{B} = \mathbf{0}$, the system in Fig. 2 reduces to a block transmission 
system with linear equalization and disjoint detection; e.g., [6], 
[12], [36], [41]-[43]. In fact, many of the results for the linear 
case can be obtained by setting $\mathbf{B} = \mathbf{0}$ in the expressions we 
will derive herein. 

If we denote the error between the input to the detector and 
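the transmitted data symbols by $\mathbf{e} = \hat{\mathbf{s}} - \mathbf{s}$, then 

$$\mathbf{e} = (\mathbf{W}\mathbf{H}\mathbf{F} - \mathbf{I})\mathbf{s} - \mathbf{B}\tilde{\mathbf{s}} + \mathbf{W}\mathbf{v}. \tag{3}$$

Under the assumption of correct past decisions (i.e., when 
deciding $s_m$, $\tilde{s}_\ell = s_\ell$ for all $m + 1 \le \ell \le M$), $\mathbf{e}$ simplifies to 

$$\mathbf{e} = (\mathbf{W}\mathbf{H}\mathbf{F} - \mathbf{I} - \mathbf{B})\mathbf{s} + \mathbf{W}\mathbf{v}. \tag{4}$$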



The covariance of this error will play a key role in our designs. 
Under our statistical models for s and v, the covariance matrix 
of the error is 



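$$\mathbf{R}_{ee} = E[\mathbf{e}\mathbf{e}^H] = (\mathbf{W}\mathbf{H}\mathbf{F} - \mathbf{B} - \mathbf{I})(\mathbf{W}\mathbf{H}\mathbf{F} - \mathbf{B} - \mathbf{I})^H + \mathbf{W}\mathbf{R}_{vv}\mathbf{W}^H. \tag{5}$$

The (arithmetic) MSE of the detector input is simply 
$\epsilon^2 = \mathrm{tr}(E[(\hat{\mathbf{s}} - \mathbf{s})(\hat{\mathbf{s}} - \mathbf{s})^H])/M = \mathrm{tr}(\mathbf{R}_{ee})/M$. 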



III. Minimum MSE Transceivers 

In this section, our goal is to jointly design the transceiver 
elements F, B, and W so that the (arithmetic) MSE is 
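minimized, subject to a bound, $p_0$, on the average transmitted 
power, and constraints which ensure that the receiver performs 
either ZF or MMSE decision-feedback detection. The average 
transmitted power is given by $E[\mathrm{tr}(\mathbf{F}\mathbf{s}(\mathbf{F}\mathbf{s})^H)] = \mathrm{tr}(\mathbf{F}\mathbf{F}^H)$, 
and hence the design problem can be stated as 

$$\min_{\mathbf{F},\mathbf{B},\mathbf{W}} \ \mathrm{tr}\big((\mathbf{W}\mathbf{H}\mathbf{F} - \mathbf{B} - \mathbf{I})(\mathbf{W}\mathbf{H}\mathbf{F} - \mathbf{B} - \mathbf{I})^H + \mathbf{W}\mathbf{R}_{vv}\mathbf{W}^H\big) \tag{6a}$$
$$\text{subject to } \mathrm{tr}(\mathbf{F}\mathbf{F}^H) \le p_0, \text{ and} \tag{6b}$$
$$\text{a functional relationship between } \mathbf{F}, \mathbf{B} \text{ and } \mathbf{W}. \tag{6c}$$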

The functional relationship between F, B and W determines 
whether the BDFD is of the ZF type or the MMSE type. 
This optimization problem is rather difficult to solve directly 
because it is not convex, and hence is subject to the standard 
difficulties associated with the potential for multiple local 
minima. However, we will use the following stages to find 
a solution (F,B, W) whose performance is optimal: 

1) Obtain a (tight) lower bound on the MSE, and minimize 
that lower bound, subject to the constraint on transmis- 
sion power. 

2) Derive a triple (F,B,W) whose performance achieves 
the minimized lower bound. 

In the following subsections, we will perform the above stages 
to obtain the minimized lower bounds on the MSE and optimal 
transceivers for the ZF and MMSE BDFDs, respectively. 

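The matrix $\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}$ will play a key role in our designs. 
For later convenience we let 

$$\mathbf{V}\boldsymbol{\Lambda}\mathbf{V}^H = \mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H} \tag{7}$$

represent the eigenvalue decomposition of $\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}$, with 
eigenvalues $\lambda_i$ arranged in non-increasing order along the 
diagonal of $\boldsymbol{\Lambda}$. For an integer $1 \le k \le K$, we also define 
$\mathbf{V}_k$ to be the first $k$ columns of $\mathbf{V}$ and $\boldsymbol{\Lambda}_k$ to be the upper left 
$k \times k$ block of $\boldsymbol{\Lambda}$. In the development of our designs, we will 
find it convenient to parameterize the $K \times M$ precoder matrix 
$\mathbf{F}$ of rank $q$ in terms of its singular value decomposition, 

$$\mathbf{F} = \boldsymbol{\Theta}\,[\boldsymbol{\Phi} \ \ \mathbf{0}_{q \times (M-q)}]\,\boldsymbol{\Psi}, \tag{8}$$

where $\boldsymbol{\Theta}$ contains $q$ columns of a $K \times K$ unitary matrix, $\boldsymbol{\Phi}$ is 
a diagonal positive definite $q \times q$ matrix, and $\boldsymbol{\Psi}$ is an $M \times M$ 
unitary matrix. 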

A. Zero-forcing BDFD 

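The zero-forcing criterion imposes the following relationship 
between $\mathbf{W}$, $\mathbf{F}$ and $\mathbf{B}$ (see (4)): 

$$\mathbf{W}\mathbf{H}\mathbf{F} = \mathbf{B} + \mathbf{I}. \tag{9}$$

Given a $P \times K$ matrix $\mathbf{H}$ and an integer $M \le \min\{P, K\}$, 
there exists a $K \times M$ matrix $\mathbf{F}$, an $M \times P$ matrix $\mathbf{W}$ and 
an $M \times M$ strictly upper triangular matrix $\mathbf{B}$ such that (9) is 
satisfied if and only if $\mathrm{rank}(\mathbf{H}) \ge M$, and we will make the 
assumption that this condition holds (see footnote 3). In order to satisfy (9), 
$\mathbf{F}$ must be chosen so that it has rank $M$ and that $\mathrm{rank}(\mathbf{H}\mathbf{F}) = M$. 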

By substituting (9) into (4) and (5), the covariance matrix 
of the error can be written as 

$$\mathbf{R}_{ee,\mathrm{ZF}} = \mathbf{W}\mathbf{R}_{vv}\mathbf{W}^H. \tag{10}$$



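If we define $\tilde{\mathbf{W}} = \mathbf{W}\mathbf{R}_{vv}^{1/2}$, then the design problem can 
be re-written as 

$$\min_{\tilde{\mathbf{W}},\mathbf{B},\mathbf{F}} \ \mathrm{tr}(\tilde{\mathbf{W}}\tilde{\mathbf{W}}^H) \tag{11a}$$
$$\text{subject to } \mathrm{tr}(\mathbf{F}\mathbf{F}^H) \le p_0, \tag{11b}$$
$$\tilde{\mathbf{W}}\tilde{\mathbf{H}}\mathbf{F} = \mathbf{B} + \mathbf{I}, \tag{11c}$$

where $\tilde{\mathbf{H}} = \mathbf{R}_{vv}^{-1/2}\mathbf{H}$. From (11) it is clear that for a given $\mathbf{F}$ 
for which there exists a solution to (11c) and a given $\mathbf{B}$, the 
optimal $\tilde{\mathbf{W}}$ is $\tilde{\mathbf{W}} = (\mathbf{B} + \mathbf{I})(\tilde{\mathbf{H}}\mathbf{F})^{+}$, where $(\cdot)^{+}$ denotes the 
(minimum-norm) Moore-Penrose pseudo-inverse. Therefore, 
the optimal receiver feedforward matrix can be written as 

$$\mathbf{W}_{\mathrm{ZF}} = (\mathbf{B} + \mathbf{I})(\tilde{\mathbf{H}}\mathbf{F})^{+}\mathbf{R}_{vv}^{-1/2}. \tag{12}$$

Since $\tilde{\mathbf{H}}\mathbf{F}$ has at least as many rows as it has columns and 
has full column rank, 

$$(\tilde{\mathbf{H}}\mathbf{F})^{+} = (\mathbf{F}^H\tilde{\mathbf{H}}^H\tilde{\mathbf{H}}\mathbf{F})^{-1}\mathbf{F}^H\tilde{\mathbf{H}}^H. \tag{13}$$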



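If we let $\mathbf{U} = \mathbf{B} + \mathbf{I}$, the design problem in (11) has been 
reduced to 

$$\min_{\mathbf{F},\mathbf{U}} \ \mathrm{tr}\big(\mathbf{U}(\tilde{\mathbf{H}}\mathbf{F})^{+}((\tilde{\mathbf{H}}\mathbf{F})^{+})^H\mathbf{U}^H\big) \tag{14a}$$
$$\text{subject to } \mathrm{tr}(\mathbf{F}\mathbf{F}^H) \le p_0, \tag{14b}$$
$$\mathbf{U} \text{ being a unit-diagonal upper-triangular matrix.} \tag{14c}$$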

The first stage in our solution of (14) is to derive and 
minimize a lower bound on the objective function (14a). 
The lower bound that we will use is a simple consequence 
of the arithmetic-geometric mean inequality [27, p. 535]. In 
particular, for an $M \times M$ positive semidefinite matrix $\mathbf{X}$, 

$$\mathrm{tr}(\mathbf{X})/M \ge |\mathbf{X}|^{1/M}, \tag{15}$$



3 If $M$ were a design variable, rather than a parameter of the problem, one 
could guarantee that this condition holds by simply choosing $M \le \mathrm{rank}(\mathbf{H})$. 



with equality holding if and only if $\mathbf{X} = \alpha\mathbf{I}$ for some $\alpha \ge 0$. 
For convenience, we will refer to (15) as the trace-determinant 
inequality. 

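Applying (15) to (14a), a lower bound on the mean-square 
error is 

$$\epsilon_{\mathrm{ZF}}^2 = \mathrm{tr}(\mathbf{W}_{\mathrm{ZF}}\mathbf{R}_{vv}\mathbf{W}_{\mathrm{ZF}}^H)/M \ge \big|\mathbf{U}(\tilde{\mathbf{H}}\mathbf{F})^{+}((\tilde{\mathbf{H}}\mathbf{F})^{+})^H\mathbf{U}^H\big|^{1/M} \tag{16a}$$
$$= |\mathbf{F}^H\tilde{\mathbf{H}}^H\tilde{\mathbf{H}}\mathbf{F}|^{-1/M}, \tag{16b}$$

where we have used the fact that $\mathbf{U}$ is a unit-diagonal upper- 
triangular matrix and thus $|\mathbf{U}| = 1$, and the expression 
for $(\tilde{\mathbf{H}}\mathbf{F})^{+}$ in (13). Observe that (16b) depends only on 
the transmitter $\mathbf{F}$ and is independent of $\mathbf{U} = \mathbf{B} + \mathbf{I}$. It 
is also of interest to point out that the bound in (16a) is 
equivalent to stating that the arithmetic MSE is bounded below 
by the geometric MSE; i.e., $\mathrm{tr}(\mathbf{R}_{ee,\mathrm{ZF}})/M \ge |\mathbf{R}_{ee,\mathrm{ZF}}|^{1/M}$. 
Therefore, the problem of minimizing the lower bound in (16a) 
corresponds to minimizing the geometric MSE. 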

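The lower bound in (16) can be minimized simply by 
maximizing $|\mathbf{F}^H\tilde{\mathbf{H}}^H\tilde{\mathbf{H}}\mathbf{F}| = |\mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F}|$; i.e., by solving 

$$\max_{\mathbf{F}} \ |\mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F}| \tag{17a}$$
$$\text{subject to } \mathrm{tr}(\mathbf{F}\mathbf{F}^H) \le p_0. \tag{17b}$$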


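Using the ordered eigen-decomposition of $\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}$ in (7), 
and applying the trace-determinant inequality (15), we have 
that 

$$|\mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F}| = |\boldsymbol{\Phi}^2|\,|\boldsymbol{\Theta}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\boldsymbol{\Theta}| \tag{18a}$$
$$\le \left(\frac{\mathrm{tr}(\boldsymbol{\Phi}^2)}{M}\right)^{M}\prod_{i=1}^{M}\lambda_i \tag{18b}$$
$$\le \left(\frac{p_0}{M}\right)^{M}\prod_{i=1}^{M}\lambda_i. \tag{18c}$$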


Therefore, for any ZF-BDFD system, the (arithmetic) MSE is 
bounded below by 

$$\epsilon_{\mathrm{ZF}}^2 \ge \frac{M}{p_0}\left(\prod_{i=1}^{M}\lambda_i\right)^{-1/M}. \tag{19}$$



This bound depends only on the parameters $M$ and $p_0$, and 
the $M$ largest eigenvalues of $\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}$. 

The second stage of the derivation of the proposed design 
is to determine matrices $\mathbf{F}$ and $\mathbf{B}$ so that the minimized lower 
bound on the arithmetic MSE in (19) is achieved. To do so, 
we point out that according to the trace-determinant inequality 
(15) and the eigenvalue decomposition of $\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}$ in (7), 
the bound in (18b) holds with equality if and only if $\boldsymbol{\Phi} = \alpha\mathbf{I}$ 
for some $\alpha > 0$ and $\boldsymbol{\Theta} = \mathbf{V}_M\mathbf{P}$, where $\mathbf{V}_M$ was defined after 
(7) and $\mathbf{P}$ is an arbitrary permutation matrix. According to the 
power constraint in (17b), the bound in (18c) is achieved if 
and only if $\alpha = \sqrt{p_0/M}$. Therefore, precoders of the form 
$\mathbf{F} = \sqrt{p_0/M}\,\mathbf{V}_M\boldsymbol{\Psi}$, where $\boldsymbol{\Psi}$ is an arbitrary $M \times M$ unitary 
matrix, minimize the geometric MSE of a ZF-BDFD system. 
The remaining task is to determine matrices such that the 
bound in (16a) holds with equality. To do so, we observe that 
the trace-determinant inequality (15) holds with equality if and 
only if $\mathbf{X} = \alpha\mathbf{I}$ for some $\alpha \ge 0$. Therefore, (16a) holds with 
equality if and only if we can choose $\boldsymbol{\Psi}$ such that $\mathbf{R}_{ee,\mathrm{ZF}} = \sigma_e^2\mathbf{I}$, 
where $\sigma_e^2 = (M/p_0)\big(\prod_{i=1}^{M}\lambda_i\big)^{-1/M}$. That is, we can 
achieve the minimized lower bound on the arithmetic MSE if 
and only if we can find a $\boldsymbol{\Psi}$ such that 

$$\frac{M}{p_0}\,\mathbf{U}\boldsymbol{\Psi}^H\boldsymbol{\Lambda}_M^{-1}\boldsymbol{\Psi}\mathbf{U}^H = \sigma_e^2\mathbf{I}, \tag{20}$$

where $\boldsymbol{\Lambda}_M$ was defined after (7). By taking the Cholesky 
factor, solving (20) is equivalent to solving 

$$\sqrt{M/p_0}\;\mathbf{U}\boldsymbol{\Psi}^H\boldsymbol{\Lambda}_M^{-1/2} = \sigma_e\mathbf{Q}^H, \tag{21}$$

where $\mathbf{Q}$ is an $M \times M$ unitary matrix. That is, we can reduce 
the search for a pair $(\mathbf{F},\mathbf{B})$ such that the minimized lower 
bound on the MSE is achieved to the search for a unit-diagonal 
upper-triangular matrix $\mathbf{U}$, and unitary matrices $\boldsymbol{\Psi}$ and $\mathbf{Q}$ that 
satisfy (21). Substituting $\sigma_e$ into (21), we get: 

$$\boldsymbol{\Lambda}_M^{1/2}\boldsymbol{\Psi} = \mathbf{Q}\tilde{\mathbf{U}}, \tag{22}$$

where $\tilde{\mathbf{U}} = \big(\prod_{i=1}^{M}\lambda_i\big)^{1/(2M)}\mathbf{U}$. The following result, which 
is a special case of a more general result in [56], [57], indicates 
that a solution to (22) exists. 

Lemma 1: Let $\mathbf{T}$ be a diagonal non-singular $M \times M$ matrix. 
There exists a unitary matrix $\mathbf{S}$ such that $\mathbf{T}\mathbf{S}$ has an equal-diagonal 
'R-factor' in its (standard) QR decomposition; i.e., $\exists\,\mathbf{S}$ 
such that $\mathbf{T}\mathbf{S} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an $M \times M$ unitary matrix and 
$\mathbf{R}$ is an upper-triangular matrix with equal diagonal elements 
$[\mathbf{R}]_{ii} = \big(\prod_{k=1}^{M}\gamma_k\big)^{1/M}$ for $i = 1, 2, \ldots, M$, where $\gamma_k$ is the 
$k$th diagonal element of $\mathbf{T}$. □ 
The matrix $\mathbf{S}$ in Lemma 1 can be obtained by suitably 
modifying Algorithm 5 in [57]. The modified algorithm is 
provided in Appendix I. Using that algorithm, we can obtain $\boldsymbol{\Psi}$ 
in (22). By performing the QR decomposition of $\boldsymbol{\Lambda}_M^{1/2}\boldsymbol{\Psi}$, we 
obtain an upper triangular matrix $\tilde{\mathbf{U}}$ whose diagonal elements 
are all equal to $\big(\prod_{i=1}^{M}\lambda_i\big)^{1/(2M)}$. Finally, we obtain $\mathbf{U}$ using 
$\mathbf{U} = \big(\prod_{i=1}^{M}\lambda_i\big)^{-1/(2M)}\tilde{\mathbf{U}}$. Thus, we have established the 
following proposition: 

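Proposition 1: The (arithmetic) mean-square error 
$\mathrm{tr}(\mathbf{R}_{ee})/M$ of a block-by-block transceiver with a 
ZF-BDFD achieves its minimized lower bound of 
$(M/p_0)\big(\prod_{i=1}^{M}\lambda_i\big)^{-1/M}$ when the precoder $\mathbf{F} = \sqrt{p_0/M}\,\mathbf{V}_M\boldsymbol{\Psi}_{\mathrm{ZF}}$, 
where $\boldsymbol{\Psi}_{\mathrm{ZF}}$ is obtained by applying the 
algorithm in Appendix I to $\boldsymbol{\Lambda}_M^{1/2}$. The corresponding 
feedback matrix $\mathbf{B} = \mathbf{U} - \mathbf{I}$, where $\mathbf{U}$ is the unit-diagonal 
upper-triangular matrix $\mathbf{U} = \big(\prod_{i=1}^{M}\lambda_i\big)^{-1/(2M)}\tilde{\mathbf{U}}$, and $\tilde{\mathbf{U}}$ is 
obtained from the QR decomposition in (22). Substituting 
such $\mathbf{F}$ and $\mathbf{B}$ into (12) yields the feedforward matrix $\mathbf{W}$. □ 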
From the above derivation it is apparent that the precoder 
in Proposition 1, which minimizes the arithmetic MSE, also 
minimizes the geometric MSE. However, a precoder that 
minimizes the geometric MSE does not necessarily minimize 
the arithmetic MSE. 
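
The gap between the two criteria can be illustrated numerically for a block of size M = 2. The sketch below is not the algorithm of Appendix I: it uses a closed-form 2 x 2 rotation that is only an illustrative instance of Lemma 1, and it assumes the standard result that, for a fixed precoder of the form sqrt(p0/M) V_M S, the best unit-diagonal feedback matrix leaves per-symbol MSEs equal to (M/p0)/r_ii^2, where r_ii are the diagonal entries of the R-factor of Lambda_M^{1/2} S. All function names and the numerical values of the eigenvalues and power budget are hypothetical.

```python
import math


def equal_diagonal_rotation(g1, g2):
    """2x2 rotation S such that diag(g1, g2) S has an equal-diagonal R-factor.

    Assumes g1, g2 > 0. Closed form valid for M = 2 only.
    """
    c = math.sqrt(g2 / (g1 + g2))
    s = math.sqrt(g1 / (g1 + g2))
    return [[c, -s], [s, c]]


def r_factor_diagonal(g1, g2, S):
    """Diagonal of R in the QR decomposition of diag(g1, g2) S (Gram-Schmidt)."""
    a1 = (g1 * S[0][0], g2 * S[1][0])          # first column of T S
    a2 = (g1 * S[0][1], g2 * S[1][1])          # second column of T S
    r11 = math.hypot(a1[0], a1[1])
    r12 = (a1[0] * a2[0] + a1[1] * a2[1]) / r11
    r22 = math.hypot(a2[0] - r12 * a1[0] / r11, a2[1] - r12 * a1[1] / r11)
    return r11, r22


def zf_symbol_mses(lam1, lam2, p0, S):
    """Per-symbol MSEs (M/p0)/r_ii^2 for a ZF-BDFD with M = 2 (assumed form)."""
    r11, r22 = r_factor_diagonal(math.sqrt(lam1), math.sqrt(lam2), S)
    return (2.0 / p0) / r11 ** 2, (2.0 / p0) / r22 ** 2


lam1, lam2, p0 = 4.0, 1.0, 2.0                     # illustrative values
bound = (2.0 / p0) / math.sqrt(lam1 * lam2)        # lower bound (19) for M = 2
identity = [[1.0, 0.0], [0.0, 1.0]]
rotation = equal_diagonal_rotation(math.sqrt(lam1), math.sqrt(lam2))
mse_identity = zf_symbol_mses(lam1, lam2, p0, identity)
mse_rotated = zf_symbol_mses(lam1, lam2, p0, rotation)
```

With these values both choices have the same geometric MSE, equal to the bound, but only the rotated precoder has equal per-symbol MSEs, so only its arithmetic MSE meets the bound.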



B. MMSE-BDFD 

In this subsection, we consider joint transmitter-receiver 
design for a system based on the MMSE-BDFD. The approach 
is similar to that for the ZF-BDFD in the previous subsection, 
but the details are substantially different. 

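Recall from Section II and Fig. 2 that the received vector 
is $\mathbf{y} = \mathbf{H}\mathbf{F}\mathbf{s} + \mathbf{v}$. Hence, the error between $\hat{\mathbf{s}}$ and $\mathbf{s}$ is 
$\mathbf{e} = \mathbf{W}\mathbf{y} - (\mathbf{B} + \mathbf{I})\mathbf{s}$. The covariance matrix of $\mathbf{y}$ is 
$\mathbf{R}_{yy} = (\mathbf{H}\mathbf{F})(\mathbf{H}\mathbf{F})^H + \mathbf{R}_{vv}$, and the cross-correlation matrix of $\mathbf{s}$ and $\mathbf{y}$ 
is $\mathbf{R}_{sy} = (\mathbf{H}\mathbf{F})^H = \mathbf{R}_{ys}^H$. In order to determine the minimum 
MSE feedforward matrix, $\mathbf{W}_{\mathrm{MMSE}}$, we exploit the standard 
first-order necessary condition for optimality known as the 
orthogonality principle [39], namely $E[\mathbf{e}\mathbf{y}^H] = \mathbf{W}\mathbf{R}_{yy} - (\mathbf{B} + \mathbf{I})\mathbf{R}_{sy} = \mathbf{0}$. 
Therefore, 

$$\mathbf{W}_{\mathrm{MMSE}} = (\mathbf{B} + \mathbf{I})\mathbf{R}_{sy}\mathbf{R}_{yy}^{-1}. \tag{23}$$

Substituting (23) into (5), and invoking the Matrix Inversion 
Lemma $(\mathbf{A} + \mathbf{C}\mathbf{B}^{-1}\mathbf{D})^{-1} = \mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{C}(\mathbf{B} + \mathbf{D}\mathbf{A}^{-1}\mathbf{C})^{-1}\mathbf{D}\mathbf{A}^{-1}$ [32], 
the covariance matrix of the error 
can be written as 

$$\mathbf{R}_{ee,\mathrm{MMSE}} = (\mathbf{B} + \mathbf{I})(\mathbf{I} + \mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F})^{-1}(\mathbf{B} + \mathbf{I})^H. \tag{24}$$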
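
As a quick sanity check of (23) and (24), consider the scalar special case M = K = P = 1, for which B = 0. The sketch below evaluates the error variance once from the general covariance expression (5) and once from the closed form (24). The numerical values of the channel gain, precoder gain and noise variance are arbitrary illustrative assumptions.

```python
# Scalar special case of (23)-(24): M = K = P = 1, so B = 0 and B + I = 1.
h = 0.8 - 0.6j        # channel gain (illustrative value)
f = 1.5 + 0.5j        # precoder gain (illustrative value)
r_vv = 0.2            # noise variance (illustrative value)

g = h * f                                   # effective gain HF
r_yy = abs(g) ** 2 + r_vv                   # covariance of y
w = g.conjugate() / r_yy                    # feedforward coefficient from (23)

# Error variance from (5): |WHF - 1|^2 + W R_vv W^H
mse_direct = abs(w * g - 1) ** 2 + abs(w) ** 2 * r_vv
# Error variance from (24): (1 + F^H H^H R_vv^{-1} H F)^{-1}
mse_closed = 1.0 / (1.0 + abs(g) ** 2 / r_vv)
```

The two values agree, which is the scalar form of the simplification that the Matrix Inversion Lemma provides in the matrix case.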

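Our goal is to design $\mathbf{F}$ and $\mathbf{B}$ to minimize the MSE 
subject to the power constraint. Letting $\mathbf{U} = \mathbf{B} + \mathbf{I}$, the design 
problem (6) can be rewritten as 

$$\min_{\mathbf{F},\mathbf{U}} \ \mathrm{tr}\big(\mathbf{U}(\mathbf{I} + \mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F})^{-1}\mathbf{U}^H\big) \tag{25a}$$
$$\text{subject to } \mathrm{tr}(\mathbf{F}\mathbf{F}^H) \le p_0, \text{ and} \tag{25b}$$
$$\mathbf{U} \text{ being a unit-diagonal upper-triangular matrix.} \tag{25c}$$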

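Following the first stage outlined at the beginning of Section III, 
we now obtain and minimize a lower bound on the 
MSE. According to the trace-determinant inequality (15), we 
have that 

$$\mathrm{tr}\big(\mathbf{U}(\mathbf{I} + \mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F})^{-1}\mathbf{U}^H\big) \ge M\,\big|\mathbf{U}(\mathbf{I} + \mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F})^{-1}\mathbf{U}^H\big|^{1/M} = M\,|\mathbf{I} + \mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F}|^{-1/M}. \tag{26}$$

Therefore, the lower bound on the MSE can be minimized by 
solving: 

$$\max_{\mathbf{F}} \ |\mathbf{I} + \mathbf{F}^H\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}\mathbf{F}| \tag{27a}$$
$$\text{subject to } \mathrm{tr}(\mathbf{F}\mathbf{F}^H) \le p_0. \tag{27b}$$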

As in the ZF case, the problem of minimizing the lower bound 
depends only on the transmitter. We point out that maximizing the objective 
in (27) is equivalent to minimizing the geometric MSE 
implicit in (26). Furthermore, the logarithm of the objective 
in (27) is the mutual information between the transmitter and 
receiver for Gaussian signals. (An analogous observation has 
been made in several similar contexts [9], [10], [17], [40], 
[52].) Hence, minimizing the lower bound on the arithmetic 
MSE in (26) is equivalent to maximizing the Gaussian mutual 
information. 
Given that the problem in (27) is equivalent to maximizing 
the mutual information for Gaussian signals, the solution involves 
a "waterfilling" power allocation over the eigenvectors 
of $\mathbf{H}^H\mathbf{R}_{vv}^{-1}\mathbf{H}$ [50]. More formally, the solution depends on 
a parameter $r \le K$ which is the largest integer satisfying 
$1/\lambda_r < \big(p_0 + \sum_{j=1}^{r}\lambda_j^{-1}\big)/r$. If we define $q = \min\{r, M\}$, 
then the following set of precoders (see footnote 4) minimizes the lower 
bound [50]: $\mathbf{F} = \mathbf{V}_q\,[\boldsymbol{\Phi} \ \ \mathbf{0}_{q \times (M-q)}]\,\boldsymbol{\Psi}$, where $\boldsymbol{\Phi}$ is a $q \times q$ 
diagonal matrix with diagonal elements satisfying 

$$|\phi_{ii}|^2 = \frac{1}{q}\Big(p_0 + \sum_{j=1}^{q}\lambda_j^{-1}\Big) - \lambda_i^{-1}, \tag{28}$$

and $\boldsymbol{\Psi}$ is an arbitrary $M \times M$ unitary matrix (see footnote 5). In that case, 
the minimal value of the lower bound on the MSE generated 
by (26) and (27) is 

$$\epsilon_{\mathrm{MMSE}}^2 \ge q^{q/M}\Big(p_0 + \sum_{j=1}^{q}\lambda_j^{-1}\Big)^{-q/M}\prod_{i=1}^{q}\lambda_i^{-1/M}, \tag{29}$$

which is independent of our design parameters $\mathbf{F}$ and $\mathbf{B}$. 
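
The waterfilling allocation in (28) and the bound in (29) can be evaluated with a few lines of Python. The following is a minimal sketch under stated assumptions: the function names `waterfill` and `mmse_lower_bound` are hypothetical, the eigenvalues are assumed to be supplied in non-increasing order with the largest one positive, and the power budget is assumed to be positive. The example eigenvalues are arbitrary.

```python
import math


def waterfill(lams, p0, M):
    """Return (q, powers) with powers[i] = |phi_ii|^2 as in (28).

    lams : eigenvalues of H^H R_vv^{-1} H in non-increasing order
           (assumes lams[0] > 0 and p0 > 0)
    """
    inv = [1.0 / lam for lam in lams if lam > 0]
    r = 0
    for k in range(1, len(inv) + 1):
        # r is the largest k with 1/lambda_k below the corresponding water level
        if inv[k - 1] < (p0 + sum(inv[:k])) / k:
            r = k
    q = min(r, M)
    level = (p0 + sum(inv[:q])) / q
    return q, [level - inv[i] for i in range(q)]


def mmse_lower_bound(lams, p0, M):
    """Right-hand side of (29)."""
    q, _ = waterfill(lams, p0, M)
    total = p0 + sum(1.0 / lam for lam in lams[:q])
    return q ** (q / M) * total ** (-q / M) * math.prod(
        lam ** (-1.0 / M) for lam in lams[:q]
    )


lams = [4.0, 1.0, 0.1]        # illustrative eigenvalues
q, powers = waterfill(lams, p0=2.0, M=3)
bound = mmse_lower_bound(lams, p0=2.0, M=3)
```

In this example the weakest eigen-direction receives no power, so q = 2 even though M = 3, and the allocated powers sum to the budget.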

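Moving to the second stage of our general approach, we 
now determine a transceiver that achieves the minimized 
lower bound in (29). For ease of exposition, we define 
$\tilde{\boldsymbol{\Phi}} = [\boldsymbol{\Phi} \ \ \mathbf{0}_{q \times (M-q)}]$. Substituting $\mathbf{F} = \mathbf{V}_q\tilde{\boldsymbol{\Phi}}\boldsymbol{\Psi}$ into (24) and (25a), 
the arithmetic MSE is $\mathrm{tr}(\mathbf{R}_{ee,\mathrm{MMSE}})/M$, where 

$$\mathbf{R}_{ee,\mathrm{MMSE}} = \mathbf{U}\boldsymbol{\Psi}^H(\mathbf{I}_M + \tilde{\boldsymbol{\Phi}}^T\boldsymbol{\Lambda}_q\tilde{\boldsymbol{\Phi}})^{-1}\boldsymbol{\Psi}\mathbf{U}^H. \tag{30}$$

Using the trace-determinant inequality (15), for the MSE 
to achieve its minimized lower bound, we must choose $\mathbf{U}$ 
and $\boldsymbol{\Psi}$ so that $\mathbf{R}_{ee,\mathrm{MMSE}} = \sigma_e^2\mathbf{I}$, where 
$\sigma_e^2 = q^{q/M}\big(p_0 + \sum_{j=1}^{q}\lambda_j^{-1}\big)^{-q/M}\prod_{j=1}^{q}\lambda_j^{-1/M}$. That is, a system of the form 
in (28) achieves the minimized lower bound on the MSE in 
(29) if and only if we can find $\tilde{\mathbf{U}} = (1/\sigma_e)\mathbf{U}$ and unitary 
matrices $\boldsymbol{\Psi}$ and $\mathbf{Q}$ so that 

$$(\mathbf{I}_M + \tilde{\boldsymbol{\Phi}}^T\boldsymbol{\Lambda}_q\tilde{\boldsymbol{\Phi}})^{1/2}\boldsymbol{\Psi} = \mathbf{Q}\tilde{\mathbf{U}}. \tag{31}$$

According to Lemma 1, there exists a unitary matrix $\boldsymbol{\Psi}$ 
such that the QR decomposition of $(\mathbf{I}_M + \tilde{\boldsymbol{\Phi}}^T\boldsymbol{\Lambda}_q\tilde{\boldsymbol{\Phi}})^{1/2}\boldsymbol{\Psi}$ 
has an upper triangular "R-factor" with diagonal elements all 
equal to $|\mathbf{I}_M + \tilde{\boldsymbol{\Phi}}^T\boldsymbol{\Lambda}_q\tilde{\boldsymbol{\Phi}}|^{1/(2M)}$. This unitary matrix 
can be obtained by applying the algorithm in Appendix I to 
$(\mathbf{I}_M + \tilde{\boldsymbol{\Phi}}^T\boldsymbol{\Lambda}_q\tilde{\boldsymbol{\Phi}})^{1/2}$. We summarize this result in the following 
proposition. 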

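Proposition 2: The mean-square error $\mathrm{tr}(\mathbf{R}_{ee})/M$ for a 
block-by-block transceiver with an MMSE-BDFD achieves 
its minimized lower bound (29) when the precoder 
$\mathbf{F} = \mathbf{V}_q\,[\boldsymbol{\Phi} \ \ \mathbf{0}_{q \times (M-q)}]\,\boldsymbol{\Psi}_{\mathrm{MMSE}}$, where $\boldsymbol{\Phi}$ satisfies (28), and $\boldsymbol{\Psi}_{\mathrm{MMSE}}$ 
is obtained by applying the algorithm in Appendix I to 
$(\mathbf{I}_M + \tilde{\boldsymbol{\Phi}}^T\boldsymbol{\Lambda}_q\tilde{\boldsymbol{\Phi}})^{1/2}$ with $\tilde{\boldsymbol{\Phi}} = [\boldsymbol{\Phi} \ \ \mathbf{0}_{q \times (M-q)}]$. The corresponding feedback matrix $\mathbf{B} = \mathbf{U} - \mathbf{I}$, 
where $\mathbf{U}$ is the unit-diagonal upper-triangular matrix $\mathbf{U} = \sigma_e\tilde{\mathbf{U}}$ 
and $\tilde{\mathbf{U}}$ is obtained from the QR decomposition in (31). 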

4 If $M = K$ and $r = K$, or if $\lambda_q > \lambda_{q+1}$, this set is the set of all 
precoders that minimize the lower bound. 

5 The rank of the resulting product HF is q, and hence if M were a design 
variable rather than a parameter of the problem, a natural choice for M would 
be M = r. 



Substituting such $\mathbf{F}$ and $\mathbf{B}$ into (23) yields the feedforward 
matrix W. □ 

As was the case for the ZF-BDFD in Section III-A, the 
precoder in Proposition 2, which minimizes the arithmetic 
MSE, lies within the set of precoders that minimize the 
geometric MSE, but a precoder chosen arbitrarily from the 
set of precoders that minimize the geometric MSE does not 
necessarily minimize the arithmetic MSE. This observation 
provides a connection between the proposed design and an 
earlier design for a more general overlapping block transmis- 
sion system in which the transmitter was designed to minimize 
the geometric MSE [52]. In the context of the block-by-block 
transmission schemes that we have considered, the design 
in [52] corresponds to choosing $\boldsymbol{\Psi} = \mathbf{I}_M$, rather than the choice 
of $\boldsymbol{\Psi} = \boldsymbol{\Psi}_{\mathrm{MMSE}}$ in Proposition 2. While the choice of $\boldsymbol{\Psi} = \mathbf{I}_M$ 
results in a system that minimizes the geometric MSE, it 
does not minimize the arithmetic MSE in the general case. 
In addition, the SINR for each element of the block may be 
different. In contrast, the choice of $\boldsymbol{\Psi} = \boldsymbol{\Psi}_{\mathrm{MMSE}}$ minimizes 
the geometric MSE and the arithmetic MSE, and provides an 
equal SINR for each element of the block. 

The choice of $\boldsymbol{\Psi}$ also has an impact on the nature of coding 
strategies for approaching the capacity of the block-by-block 
transmission system. From the discussion following (27) it is 
evident that the Gaussian mutual information is maximized 
by choosing $M = r$ and employing a transmitter matrix of 
the form $\mathbf{F} = \mathbf{V}_r\boldsymbol{\Phi}\boldsymbol{\Psi}$, where $\boldsymbol{\Phi}$ satisfies (28) and $\boldsymbol{\Psi}$ is an 
arbitrary $r \times r$ unitary matrix. Since the MMSE-BDFD is 
a "canonical" receiver (see footnote 6) for Gaussian signals [9], [10], [23], 
this suggests that by using sufficiently powerful codes, reliable 
communication at rates approaching the capacity of the block 
transmission system can be achieved by employing any F of 
this form and the MMSE-BDFD [9], [10], [23]. The choice 
$\boldsymbol{\Psi} = \mathbf{I}_r$ results in a "vector coding" scheme [10], [29], [30], 
[36], [41] in which the feedback component of the MMSE-BDFD 
is inactive; i.e., $\mathbf{B} = \mathbf{0}$. Vector coding induces an 
equivalent system with $r$ parallel Gaussian subchannels, each 
with a possibly different SNR $\rho_i$. (Standard discrete multitone 
(DMT) modulation schemes [5], [8] are a class of vector 
coding schemes.) Therefore, one can approach the capacity 
of the block transmission scheme by choosing the code for 
the $i$th element of the block to be one that approximates the 
ideal Gaussian code of rate $b_i = \log_2(1 + \rho_i)$ bits per channel 
use. (Such approximations will often involve the selection of 
a constellation for each element of the block.) The choice 
$\boldsymbol{\Psi} = \boldsymbol{\Psi}_{\mathrm{MMSE}}$ results in a system in which the feedback 
component of the MMSE-BDFD is active, and the inputs 
to the decision device are uncorrelated and have identical 
SINRs $\rho$. Since the MMSE-BDFD is a canonical receiver, this 
suggests that one can also approach the capacity of the block 
transmission system by employing an independent instance 
of the same approximation of the ideal Gaussian code of 
rate $b = \log_2(1 + \rho)$ for each element of the block. The 
MMSE-BDFD used when $\boldsymbol{\Psi} = \boldsymbol{\Psi}_{\mathrm{MMSE}}$ is more complicated 

6 The term "canonical" is used to denote the fact that in the absence of error 
propagation, employing an MMSE-BDFD in place of the optimal detector 
does not reduce the achievable data rate [9], [10]. Methods for exploiting this 
property of the MMSE-BDFD were described in [24], [48]. 
to implement than the linear detector of the vector coding 
scheme because of the need to compute the feedback signal. 
However, the vector coding approach requires the design (and 
implementation) of (up to) r codes, one for each element of 
the block, whereas the proposed design requires the design of 
only one code. 
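
The trade-off between the two code designs can be illustrated numerically. The following is a minimal sketch, not part of the proposed design: the function names and the example SNR values are hypothetical, and the relation $(1 + \rho)^r = \prod_i (1 + \rho_i)$ is an assumption that follows from both schemes attaining the same Gaussian mutual information.

```python
import math

def vector_coding_rates(subchannel_snrs):
    """Per-element rates b_i = log2(1 + rho_i) for the vector coding scheme."""
    return [math.log2(1.0 + rho_i) for rho_i in subchannel_snrs]

def common_sinr(subchannel_snrs):
    """SINR shared by all elements, ASSUMING (1 + rho)^r = prod_i (1 + rho_i)."""
    r = len(subchannel_snrs)
    mean_log = sum(math.log(1.0 + rho_i) for rho_i in subchannel_snrs) / r
    return math.exp(mean_log) - 1.0

# Hypothetical subchannel SNRs (linear scale), chosen only for illustration.
example_snrs = [15.0, 7.0, 3.0, 1.0]

per_element_rates = vector_coding_rates(example_snrs)   # r different codes
rho = common_sinr(example_snrs)
common_rate = math.log2(1.0 + rho)                       # one code, used r times

total_vector_coding = sum(per_element_rates)
total_single_code = len(example_snrs) * common_rate
```

Under that assumption the two totals coincide, so the single-code design gives up no rate; it exchanges the design of several codes for the cost of computing the feedback signal.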

IV. Bit Error Rate Performance 

In this section, we show that the $(F, B)$ pairs designed in
Section III to minimize the arithmetic MSE also minimize the
(dominant components of the uncoded) bit error rate (BER)
of a block transmission system with uniform bit loading at
moderate-to-high block SNRs. We define the average BER of
the detected signal to be the average of the probability of error
of each element of the block; i.e.,

$$P_e = \frac{1}{M} \sum_{i=1}^{M} P_{e,i}, \tag{32}$$

where $P_{e,i}$ denotes the BER of the $i$th symbol $s_i$. For ease
of exposition, we will deal with the ZF and MMSE-BDFDs
separately. We will begin with the case of the ZF-BDFD.

A. ZF-BDFD 

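For the ZF-BDFD$^{7}$ and for square$^{8}$ QAM signalling with
$2b_i$ bits per symbol, if all the previous decisions are correct,
$P_{e,i}$ is closely approximated$^{9}$ by [7]

$$P_{e,i} \approx \tilde{P}_{e,i} = \alpha_i \,\mathrm{erfc}\bigl(\sqrt{\beta_i \rho_{i,\text{ZF}}}\bigr) + \zeta_i \,\mathrm{erfc}\bigl(3\sqrt{\beta_i \rho_{i,\text{ZF}}}\bigr), \tag{33}$$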

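where $\mathrm{erfc}(x) = (2/\sqrt{\pi}) \int_x^{\infty} e^{-z^2}\, dz$ is the
error function complement, $\rho_{i,\text{ZF}}$ is the decision point SNR
for the $i$th symbol in the block,
$\alpha_i = (2^{b_i} - 1)/(b_i 2^{b_i})$,
$\beta_i = 3/\bigl(2(4^{b_i} - 1)\bigr)$ and
$\zeta_i = (2^{b_i} - 2)/(b_i 2^{b_i})$. Hence,

$$P_e \approx \tilde{P}_e = \frac{1}{M} \sum_{i=1}^{M} \tilde{P}_{e,i}.$$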

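Under the assumption that all the previous symbols were
correctly detected, we have that

$$\rho_{i,\text{ZF}} = \frac{E[|s_i|^2]}{E[|s_i - \hat{s}_i|^2]}, \tag{34}$$

and under our assumptions that $E[ss^H] = I$ and $E[sv^H] = 0$,
this expression simplifies to

$$\rho_{i,\text{ZF}} = \frac{1}{[R_{ee,\text{ZF}}]_{ii}}. \tag{35}$$

Therefore, the average BER can be closely approximated by

$$P_e \approx \tilde{P}_e = \frac{1}{M} \sum_{i=1}^{M} \Bigl[ \alpha_i \,\mathrm{erfc}\bigl(\sqrt{\beta_i / [R_{ee,\text{ZF}}]_{ii}}\bigr) + \zeta_i \,\mathrm{erfc}\bigl(3\sqrt{\beta_i / [R_{ee,\text{ZF}}]_{ii}}\bigr) \Bigr]. \tag{36}$$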

7 We implicitly assume that $\mathrm{rank}(H) \ge M$ so that the ZF-BDFD exists.

8 For notational simplicity we have restricted our attention to square QAM
constellations. The extension to rectangular QAM constellations can be
derived in a straightforward manner using the BER expressions in [7], [53].

9 In the case of QPSK signalling, the expression in (33), in which $\zeta_i = 0$,
is exact.



Since our precoders generate equal decision point SNRs for
each element of the block, we will assume uniform bit-loading
in the remainder of this section, and therefore we will drop the
element index, $i$, in $\alpha_i$, $\beta_i$ and $\zeta_i$. When
$[R_{ee,\text{ZF}}]_{ii} < 2\beta/3$, which corresponds to
moderate-to-high SNRs, $\tilde{P}_e$ is a convex function of
$[R_{ee,\text{ZF}}]_{ii}$ [12], [13], [36]. By applying Jensen's
inequality [11] to (36), we obtain the following lower bound
on the average BER

$$\tilde{P}_e \ge \alpha \,\mathrm{erfc}\bigl(\sqrt{\beta M / \mathrm{tr}(R_{ee,\text{ZF}})}\bigr) + \zeta \,\mathrm{erfc}\bigl(3\sqrt{\beta M / \mathrm{tr}(R_{ee,\text{ZF}})}\bigr). \tag{37}$$

Equality in (37) holds if and only if the diagonal elements of
$R_{ee,\text{ZF}}$ are equal.
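
The convexity threshold $2\beta/3$ can be checked directly. The following worked derivation is a sketch in which $f$ and $x$ are shorthand introduced here for illustration, with $x$ standing for a diagonal element of the error covariance matrix of the ZF-BDFD.

```latex
\begin{align*}
f(x)   &= \operatorname{erfc}\bigl(\sqrt{\beta/x}\bigr), \qquad x > 0, \\
f'(x)  &= \sqrt{\beta/\pi}\; x^{-3/2}\, e^{-\beta/x}, \\
f''(x) &= \sqrt{\beta/\pi}\; x^{-7/2}\, e^{-\beta/x}\,\bigl(\beta - \tfrac{3}{2}x\bigr).
\end{align*}
```

Hence $f''(x) \ge 0$ exactly when $x \le 2\beta/3$. The second term of the BER approximation has the same form with $\beta$ replaced by $9\beta$, so it is convex for $x \le 6\beta$, and the tighter condition $x < 2\beta/3$ guarantees convexity of both terms.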

Equation (37) exposes an intriguing relationship between
the (arithmetic) MSE and the BER. Since minimizing
$\mathrm{tr}(R_{ee,\text{ZF}})$ simultaneously minimizes both terms in the
summation on the right hand side of (37), minimizing the lower
bound on $\tilde{P}_e$ in (37) is equivalent to minimizing the MSE;
i.e., it is equivalent to minimizing $\mathrm{tr}(R_{ee,\text{ZF}})$. Therefore, the
lower bound on $\tilde{P}_e$ achieves its minimum value if the MSE is
minimal. However, for the actual $\tilde{P}_e$ to achieve its lower bound
(i.e., for (37) to hold with equality), the diagonal elements of
$R_{ee,\text{ZF}}$ must be identical.$^{10}$ Fortunately, the design proposed in
Proposition 1 results in an $R_{ee,\text{ZF}}$ that is a scaled identity
matrix, and hence the proposed
design, which minimizes the (arithmetic) MSE of a ZF-BDFD,
also minimizes the BER of the ZF-BDFD at moderate-to-high
SNRs, in the sense that it minimizes $\tilde{P}_e$ in (36).
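
The role of equal diagonal elements can be checked numerically. The sketch below is illustrative only: the function names and the MSE values are hypothetical, and the default coefficients (alpha = 1/2, beta = 1/2, zeta = 0) are the assumed values for 4-QAM (QPSK) signalling.

```python
import math

def approx_ber(mse_diag, alpha=0.5, beta=0.5, zeta=0.0):
    """Average of alpha*erfc(sqrt(beta/x)) + zeta*erfc(3*sqrt(beta/x)) over x."""
    total = 0.0
    for x in mse_diag:
        arg = math.sqrt(beta / x)
        total += alpha * math.erfc(arg) + zeta * math.erfc(3.0 * arg)
    return total / len(mse_diag)

def jensen_lower_bound(mse_diag, alpha=0.5, beta=0.5, zeta=0.0):
    """Lower bound obtained by replacing every diagonal element by their mean."""
    mean_mse = sum(mse_diag) / len(mse_diag)
    return approx_ber([mean_mse], alpha, beta, zeta)

# Two hypothetical error covariance diagonals with the same trace (0.2).
# Every element is below 2*beta/3 = 1/3, i.e., inside the convex region.
unequal_diag = [0.02, 0.05, 0.08, 0.05]
equal_diag = [0.05, 0.05, 0.05, 0.05]

ber_unequal = approx_ber(unequal_diag)
ber_equal = approx_ber(equal_diag)
bound = jensen_lower_bound(unequal_diag)
```

For a fixed trace, the equal-diagonal case meets the bound while the unequal case lies strictly above it, which is why an error covariance that is a scaled identity is needed in addition to a minimal MSE.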

B. MMSE-BDFD 

The analysis of the previous section can be extended to 
the case of the MMSE-BDFD if the residual intra-block 
interference on each element of the block is approximated 
by a Gaussian random variable. For large block sizes, this 
approximation is (almost surely) sufficiently accurate for all 
but the last few elements of the block (c.f., [25], [38], [54]), 
and hence it is appropriate for our analysis. In order to account 
for the bias in the MMSE-BDFD (e.g., [9]), we can express 
the BER as a function of the decision point SINR of the ith 
element of the block [9], [10], [36], 

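$$\rho_{i,\text{MMSE}} = \frac{1}{[R_{ee,\text{MMSE}}]_{ii}} - 1. \tag{38}$$

(Note that $0 < [R_{ee,\text{MMSE}}]_{ii} < 1$.) By replacing $\rho_{i,\text{ZF}}$ in (33) by
$\rho_{i,\text{MMSE}}$, the BER of the MMSE-BDFD can be approximated
by

$$P_e \approx \tilde{P}_e = \frac{1}{M} \sum_{i=1}^{M} \Bigl[ \alpha_i \,\mathrm{erfc}\Bigl(\sqrt{\beta_i \bigl(([R_{ee,\text{MMSE}}]_{ii})^{-1} - 1\bigr)}\Bigr) + \zeta_i \,\mathrm{erfc}\Bigl(3\sqrt{\beta_i \bigl(([R_{ee,\text{MMSE}}]_{ii})^{-1} - 1\bigr)}\Bigr) \Bigr]. \tag{39}$$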

As was the case for the ZF-BDFD, this function is convex in
$[R_{ee,\text{MMSE}}]_{ii}$ when $[R_{ee,\text{MMSE}}]_{ii}$ is below a (reasonably large)
threshold [6], [36], and hence for a system in which uniform
bit loading is applied, Jensen's inequality can be used to show
that

$$\tilde{P}_e \ge \alpha \,\mathrm{erfc}\Bigl(\sqrt{\beta \bigl(M/\mathrm{tr}(R_{ee,\text{MMSE}}) - 1\bigr)}\Bigr) + \zeta \,\mathrm{erfc}\Bigl(3\sqrt{\beta \bigl(M/\mathrm{tr}(R_{ee,\text{MMSE}}) - 1\bigr)}\Bigr), \tag{40}$$

with equality holding when the diagonal elements of $R_{ee,\text{MMSE}}$
are equal. Hence, using similar arguments to those used in the
case of the ZF-BDFD, the design proposed in Proposition 2,
which minimizes the arithmetic MSE of the MMSE-BDFD
and results in an $R_{ee,\text{MMSE}}$ that is a scaled identity matrix, also
minimizes the BER of the
MMSE-BDFD at moderate-to-high SNRs, in the sense that it
minimizes $\tilde{P}_e$ in (39).$^{11}$

10 The alternative analysis in [47] generates a related observation.

V. Performance Analysis 

In Section IV it was shown that the precoders that we
designed in Section III (essentially) minimize the BER of the
BDFD, under the assumption that the decisions that are fed 
back in the receiver are correct. It can also be shown (see 
Appendix II) that under the same assumption the optimized
system for an MMSE-BDFD provides a lower BER than the 
optimized system for a ZF-BDFD, and that each optimized 
BDFD system provides a lower BER than the optimized 
system for the corresponding linear detector; c.f., [6], [12], 
[36]. That said, an incorrect decision in a BDFD can make it 
more likely that subsequent errors will occur by feeding back 
incorrect decisions. This may lead to error propagation across 
the block. (Recall that error propagation between blocks is 
explicitly avoided in block-by-block communication systems.) 
A standard bound on the probability of error of a conven- 
tional decision feedback equalizer in the presence of error 
propagation is a simple multiple of the probability of error 
in the absence of error propagation [16]. This suggests that 
the systems designed in Section III should perform well in
the presence of error propagation. (A bound that is sometimes 
tighter [1] generates similar insight.) In this section, we 
seek to verify these suggestions by analyzing, via simulation, 
the (uncoded) BER performance of the system when error 
propagation may occur. 

We will consider two communication scenarios: zero- 
padded block transmission [41], [42], [44] through a (quasi- 
static) scalar finite impulse response (FIR) frequency-selective 
fading channel that is constant over the length of the block; 
and transmission through a narrowband (i.e., frequency-flat) 
multiple antenna fading channel with at least as many receive 
antennas as transmit antennas [18]. In the first scenario, 
the channel matrix H is a tall, lower triangular, Toeplitz 
matrix, but in the second scenario H does not possess any 
deterministic structure. We will evaluate the average BER 
performance of various transceivers for these channels in the 
presence of additive white Gaussian noise at the receiver; i.e., 
$R_{vv} = \sigma^2 I$. We will plot the BER performance curves as
a function of the (system) SNR, which we define as being
the ratio of the transmitted energy per symbol to the noise
variance; i.e., $(p_0/M)/\sigma^2$.

11 Note that if $M > r$, then $\mathrm{rank}(F) < M$ and hence the lower bound on
the BER in (40) will be quite high. If $M$ were a design variable, rather than
a parameter of the problem, reducing the symbol rate to $M = r$ would result
in a substantial reduction in the error rate of the optimized system.

In addition to the transceivers we designed for the ZF-BDFD
and MMSE-BDFD in Section III, for which the precoders are
denoted by $F_{\text{OPT-ZF-BDFD}}$ and $F_{\text{OPT-MMSE-BDFD}}$, respectively, when
$M = K$ we will also consider the direct transmission scheme,
for which the precoder is

$$F_I = \sqrt{p_0/M}\; I_M, \tag{41}$$

and the discrete Fourier transform (DFT) precoded scheme,
for which the precoder is

$$F_{\text{DFT}} = \sqrt{p_0/M}\; D^H, \tag{42}$$

where $D$ is the normalized $M \times M$ DFT matrix. For the
precoders in (41) and (42), the receiver matrices $B$ and $W$ are
chosen according to the (separate) design procedures for the 
ZF-BDFD and MMSE-BDFD in [44]. (Note that the precoders 
in the direct and DFT schemes are channel independent.) For 
all these precoders, we provide BER curves for the idealized 
detector, in which the decisions that are fed back are correct, 
and for the practical detector, in which the actual decisions 
are fed back (and hence error propagation may occur). 
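
The two channel-independent precoders are simple to construct. The following is a minimal sketch in plain Python; the function names are hypothetical, and the sign convention of the DFT exponent and the interpretation of $p_0$ as the total transmitted power (so that the squared Frobenius norm of the precoder equals $p_0$) are assumptions made for illustration.

```python
import cmath
import math

def direct_precoder(M, p0):
    """Direct transmission: sqrt(p0/M) times the M x M identity matrix."""
    scale = math.sqrt(p0 / M)
    return [[scale if r == c else 0.0 for c in range(M)] for r in range(M)]

def dft_precoder(M, p0):
    """DFT precoding: sqrt(p0/M) * D^H, with D[m][n] = exp(-2j*pi*m*n/M)/sqrt(M)."""
    scale = math.sqrt(p0 / M) / math.sqrt(M)
    # Entry (r, c) of D^H is the conjugate of D[c][r].
    return [[scale * cmath.exp(2j * math.pi * r * c / M) for c in range(M)]
            for r in range(M)]

def transmit_power(F):
    """Squared Frobenius norm of F, i.e., tr(F F^H)."""
    return sum(abs(entry) ** 2 for row in F for entry in row)

def column_inner_product(F, a, b):
    """Inner product between columns a and b of F."""
    return sum(row[a].conjugate() * row[b] if isinstance(row[a], complex)
               else row[a] * row[b] for row in F)

M, p0 = 16, 2.0
F_direct = direct_precoder(M, p0)
F_dft = dft_precoder(M, p0)
```

Both precoders use the same total power and have orthogonal columns of equal norm, so they differ only in the unitary rotation applied to the symbol block.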

In order to assess the extent of the performance gains
(derived in Appendix II) of the optimized BDFD systems over
the optimized system for the corresponding linear detector,
we will include the performance of systems with linear ZF
and MMSE detection and precoders designed so that the
BER at moderate-to-high block SNRs is minimized [6], [12],
[36]. Using the notational conventions in Sections II and III,
in particular the ordered eigen decomposition
$H^H R_{vv}^{-1} H = V \Lambda V^H$, with $\lambda_j$ denoting the $j$th
diagonal element of $\Lambda$, a minimum BER precoder for the linear ZF detector
is [12]

$$F_{\text{OPT-ZF-L}} = \sqrt{p_0 / \mathrm{tr}(\Lambda_M^{-1/2})}\; V_M \Lambda_M^{-1/4} D, \tag{43}$$

and one for the linear MMSE detector is [6], [36]

$$F_{\text{OPT-MMSE-L}} = V_k \,[\,T \;\; 0_{k \times (M-k)}\,]\, D, \tag{44}$$

where the integer $k = \min\{\ell, M\}$, where $\ell$ is the largest
integer such that

$$\lambda_\ell^{-1/2} \Bigl( \sum_{j=1}^{\ell} \lambda_j^{-1/2} \Bigr) - \sum_{j=1}^{\ell} \lambda_j^{-1} < p_0,$$

and $T$ is a $k \times k$ diagonal matrix with diagonal elements
satisfying

$$|[T]_{ii}|^2 = \frac{p_0 + \sum_{j=1}^{k} \lambda_j^{-1}}{\sum_{j=1}^{k} \lambda_j^{-1/2}}\, \lambda_i^{-1/2} - \lambda_i^{-1}.$$

A. Scalar frequency-selective fading channel 

In this section we consider the case of zero-padded block 
transmission through a (quasi-static) scalar FIR frequency- 
selective fading channel. In this case, the direct transmission 
scheme in (41) is sometimes referred to as the "single-carrier
zero-padded" (SCZP) scheme [49], and the DFT precoded 
scheme is sometimes called the "zero-padded OFDM" (ZP- 
OFDM) scheme [34]. We consider a scenario in which the 
channel is of length L + 1 = 5 and L zeros are appended to
each block of channel symbols u. The symbol block s is of 
length M = 16, and we consider square precoders F. (Hence, 
K = 16 and P = K + L = 20.) Each element of s is an 
independently selected symbol from the 4-QAM constellation, 
with each constellation point being equally likely. In Fig. 3 we
plot the BER for the ZF-BDFD transceivers, averaged over ten 
thousand channel realizations. (In the optimized designs, the 
transceiver was re-designed for each channel realization.) For 
each channel realization the tap coefficients were generated 
independently from a zero-mean circular complex Gaussian 
distribution and then normalized so that the impulse response 
had unit energy. It is clear from the solid curves in Fig. 3 that
in the absence of error propagation, the design proposed in
Proposition 1 performs better than all the other transmission
schemes,$^{12}$ although the SNR gain over the direct transmission
(SCZP) scheme is rather small (around 0.5 dB at a BER
of $10^{-4}$). Furthermore, the dashed curves demonstrate that
this performance advantage is maintained in the presence 
of error propagation. In particular, the performance of the 
proposed scheme in the presence of error propagation is 
as good as the performance of the SCZP scheme in the 
absence of error propagation. The combination of the DFT 
transmitter (ZP-OFDM) and the ZF-BDFD performs poorly 
at moderate-to-high block SNRs. In fact, it is apparent from 
Fig. 3 that the linear ZF detection scheme with its minimum
BER precoder [12] performs better than the combination of 
the DFT transmitter and the ZF-BDFD. However, as predicted 
by the analysis in Appendix II, the optimal precoder for the
ZF-BDFD provides substantially better performance than the 
combination of the linear ZF detector and its minimum BER 
precoder. 
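
The channel model used in this scenario can be sketched as follows. This is an illustrative construction only: the function names and the fixed random seed are hypothetical, and the taps are drawn with Python's standard generator rather than with whatever tool produced the reported curves.

```python
import math
import random

def random_unit_energy_taps(num_taps, rng):
    """Independent zero-mean circular complex Gaussian taps, scaled to unit energy."""
    taps = [complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            for _ in range(num_taps)]
    energy = sum(abs(t) ** 2 for t in taps)
    return [t / math.sqrt(energy) for t in taps]

def zero_padded_channel_matrix(taps, K):
    """Tall (K+L) x K lower triangular Toeplitz matrix for zero-padded blocks."""
    L = len(taps) - 1
    P = K + L
    return [[taps[r - c] if 0 <= r - c <= L else 0j for c in range(K)]
            for r in range(P)]

rng = random.Random(0)                       # fixed seed, for repeatability only
taps = random_unit_energy_taps(5, rng)       # channel of length L + 1 = 5
H = zero_padded_channel_matrix(taps, 16)     # K = 16, so P = K + L = 20

# Every column contains the complete impulse response, so all columns of H
# have the same (unit) energy regardless of the channel realization.
column_energies = [sum(abs(H[r][c]) ** 2 for r in range(len(H)))
                   for c in range(16)]
```

The equal column energies are one concrete view of the structure that the direct transmission (SCZP) scheme leaves intact.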

The corresponding results for the MMSE-BDFD are provided
in Fig. 4. The same trends are observed and the SNR
gains are at least as large. Furthermore, the improved BER
performance of the optimized MMSE-BDFD system over
the optimized ZF-BDFD system predicted by the analysis in
Appendix II can be clearly observed. In both Figs. 3 and 4,
the performance of the optimized scheme in the absence of
error propagation is indistinguishable from the corresponding
bound on $\tilde{P}_e$ in Section IV; c.f., (37) and (40), respectively.

An interesting by-product of the above performance eval- 
uation is the good performance provided by the (channel 
independent) direct transmission scheme (SCZP). In fact, the 
SCZP scheme is an optimal channel independent transmission 
scheme for systems that employ linear [31] or maximum 
likelihood [49], [55] detection, and it approaches the diversity- 
multiplexing trade-off for a standard class of FIR channels as 
the block length grows [21]. These desirable characteristics 
are due, in part, to the fact that the SCZP scheme preserves 
the good conditioning properties implicit in the tall lower- 
triangular Toeplitz structure of the channel matrix. 



12 As predicted by the derivation in Section IV-A, the proposed precoder
performs better than all other transmission schemes for each realization of the 
channel. 
Fig. 3. Average BER performance of the ZF-BDFD for the various precoders
and the linear ZF detector with its optimal precoder in the scalar frequency-
selective fading channel scenario in Section V-A. The solid curves denote
performance achieved in the absence of error propagation, and the dashed
curves incorporate the effects of error propagation. Legend: *: optimized
scheme, $F_{\text{OPT-ZF-BDFD}}$; o: direct (SCZP), $F_I$; ×: DFT (ZP-OFDM), $F_{\text{DFT}}$; o:
optimized linear ZF scheme, $F_{\text{OPT-ZF-L}}$. (Horizontal axis: SNR, dB.)

B. Multiple antenna systems 

In this example, we consider the case of narrowband trans- 
mission over a multiple antenna channel with at least as many 
receiver antennas as transmitter antennas. In this scenario, the 
combination of the direct transmission scheme and a BDFD is 
sometimes referred to as (uncoded) V-BLAST with a (fixed- 
order) "nulling and cancelling" receiver [4], [18], [20]. We 
consider a standard Rayleigh model for the channel in which 
the paths between antennas are modelled as independent zero- 
mean circular Gaussian random variables of unit variance. 

We will focus on scenarios with $K = 3$ transmitter antennas
and $P = 3$ or $4$ receiver antennas in which $M = K = 3$
symbols are transmitted per channel use. Each element
of s is an independent and equally-likely 4-QAM symbol.
Therefore, the bit rate of each scheme is 6 bits-per-channel-
use (bpcu). In Figs. 5 and 6 we plot the average BER
performance over ten thousand channel realizations of the
various transmission schemes with the ZF receivers, and in
Figs. 7 and 8 we plot the corresponding curves for the MMSE
receivers. While most of the basic trends from the case of
the scalar frequency-selective channels are maintained in the
multiple antenna scenario, the performance advantages of the
precoders designed in Section III are much greater. (The
SNR gains are of the order of 6-8 dB at a BER of $10^{-4}$.)
This can be attributed to the fact that the channel matrix H 
does not possess any deterministic structure. In particular, the 
probability of encountering a channel matrix that does not 
have M substantial singular values is not negligible. Since 
the proposed designs provide significantly better performance 
in those cases, the average performance is also substantially 
improved. 

Fig. 4. Average BER performance of the MMSE-BDFD for the various
precoders and the linear MMSE detector with its optimal precoder in the
scalar frequency-selective fading channel scenario in Section V-A. The solid
curves denote performance achieved in the absence of error propagation, and
the dashed curves incorporate the effects of error propagation. Legend:
△: optimized scheme, $F_{\text{OPT-MMSE-BDFD}}$; o: direct (SCZP), $F_I$; ×: DFT (ZP-
OFDM), $F_{\text{DFT}}$; □: optimized linear MMSE scheme, $F_{\text{OPT-MMSE-L}}$.

As expected, the performance of the optimized ZF-BDFD
scheme in the absence of error propagation in Figs. 5 and 6
is equal to the lower bound on $\tilde{P}_e$ in (37). (Recall that we
are using 4-QAM signalling.) However, in the MMSE-BDFD
case, the lower bound on $\tilde{P}_e$ in (40) is distinguishable from
the simulated BER in the absence of error propagation. This is
due to the fact that the block size ($M = 3$) is small enough for
the inaccuracy of the Gaussian approximation of the residual
interference to result in a discernible difference between the
BER and $\tilde{P}_e$. That said, even for this small block size, $\tilde{P}_e$ is
an accurate approximation of the BER in the absence of error
propagation.

A few other features of Figs. 5–8 are worthy of note. First,
the average performance of the direct and DFT transmission
schemes is essentially the same. This is to be expected
because the statistics of H are unitarily invariant. Second, the
increase in the diversity provided by the channel when using 
P = 4 receiver antennas rather than P = 3 is clear from 
the different slopes of the BER curves at high SNR. Finally, 
the performance advantage of the optimized MMSE-BDFD 
scheme over the optimized ZF-BDFD scheme is significant in 
the case of P = 4 receiver antennas and is substantial in the 
case of P = 3. The performance advantage of the optimized 
MMSE-BDFD scheme is due, in part, to the fact that the power
allocated to the first $M$ eigenmodes of $H^H R_{vv}^{-1} H$ depends on
the corresponding eigenvalues. In particular, weak eigenmodes
might not be allocated any power at all. In contrast, the
optimized ZF-BDFD scheme allocates power uniformly over
these eigenmodes. The larger performance advantage of the
optimized MMSE-BDFD scheme in the case of $P = 3$ is due
to the larger probability of encountering a channel matrix such
that $H^H R_{vv}^{-1} H$ does not have $M = 3$ significant eigenvalues.

For reference, we have included the performance of a standard
orthogonal space-time block coding (OSTBC) scheme in
Figs. 5–8. (Like the direct and DFT transmission schemes,
OSTBC schemes were designed to be applied without knowledge
of the channel at the transmitter.) We have used the (symbol)
rate 3/4 code in [19] (which is a simplified version of that 
in [45]), and hence in order to achieve a bit rate of 6 bpcu, a 
natural choice for the underlying constellation is 256-QAM. 
(We assume that the channel is constant for the four channel 
uses that are required to transmit the codewords.) As expected, 
at high SNR, the OSTBC scheme provides better BER performance
than the direct transmission (V-BLAST) scheme.
However, the proposed precoder (which exploits knowledge 
of the channel) provides substantially better performance when 
P = 4 receiver antennas are employed, and when P = 3 and 
the MMSE-BDFD receiver is used. 

When P = 3 receiver antennas are employed and the ZF- 
BDFD is used, the OSTBC scheme performs better than the 
optimized scheme at high SNRs. This does not contradict the 
optimality of the proposed transceiver design, because the 
values of M, K and P, and the structure of the channel 
matrix, are different for the OSTBC scheme. 13 The good 
performance of the OSTBC scheme at high SNRs is simply a 
manifestation of the trade-off between error rate (achievable 
diversity) and symbol rate in multiple antenna fading channels 
without outer codes [46]. (That trade-off is related to the fun- 
damental diversity-multiplexing trade-off [58].) The symbol 
rate of the OSTBC scheme is significantly lower than that 
of the proposed scheme. 14 Hence, in the range of SNRs in 
which noise dominates the error performance, the proposed 
scheme provides better performance than the OSTBC scheme, 
but in the SNR range in which the channel condition dominates 
the error performance, the OSTBC scheme provides better 
performance. To illustrate that point, in Fig. 5 we plotted
with unmarked curves the performance of the proposed ZF-
BDFD scheme with a symbol rate of $M = 2$ (as distinct
from the scheme with $M = 3$ described above). In order
to maintain a bit rate of 6 bpcu, the elements of s were
taken, in an independent and equally-likely fashion, from an
8-QAM constellation, and for consistency, the SNR was defined
to be $(p_0/3)/\sigma^2$. Over the range of SNRs considered, the
performance of the proposed ZF-BDFD scheme with M = 2 
is substantially better than that of the OSTBC scheme, with 
SNR gains of over 7 dB. 
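
The bit rates quoted for the three schemes follow from simple arithmetic, sketched below. The function name is hypothetical; the symbol rates and constellation sizes are those stated in this section.

```python
import math

def bits_per_channel_use(symbols_per_channel_use, constellation_size):
    """Bit rate = (symbols per channel use) x (bits per symbol)."""
    return symbols_per_channel_use * math.log2(constellation_size)

# Direct, DFT and optimized schemes: M = 3 symbols per channel use, 4-QAM.
rate_block_schemes = bits_per_channel_use(3, 4)

# OSTBC: symbol rate 3/4 (3 symbols in 4 channel uses), 256-QAM.
rate_ostbc = bits_per_channel_use(3 / 4, 256)

# Proposed ZF-BDFD scheme with reduced symbol rate: M = 2 symbols, 8-QAM.
rate_reduced = bits_per_channel_use(2, 8)
```

All three evaluate to 6 bpcu, so the comparison in the figures is made at a common bit rate even though the symbol rates differ.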

13 In this example, the channel matrix for the OSTBC scheme is $I_4 \otimes H$,
where $\otimes$ denotes the Kronecker product and H is the channel matrix for the
other schemes. The corresponding block sizes are P = 12, K = 12, and
M = 3.

14 In particular, in 4 consecutive channel uses, the proposed scheme
transmits $4M = 12$ symbols, whereas the OSTBC scheme transmits only
3 symbols.

Fig. 5. Average BER performance of the ZF-BDFD for the various precoders
and the linear ZF detector with its optimal precoder in the narrowband
multiple antenna scenario in Section V-B with 3 transmitter antennas, 3
receiver antennas and M = 3 symbols per block. The solid curves denote
performance achieved in the absence of error propagation, and the dashed
curves incorporate the effects of error propagation. Legend: *: optimized
scheme, $F_{\text{OPT-ZF-BDFD}}$; o: direct, $F_I$; ×: DFT, $F_{\text{DFT}}$; o: optimized linear ZF
scheme, $F_{\text{OPT-ZF-L}}$; ▽: OSTBC. For later reference, the unmarked curves are
for the optimized scheme with M = 2.

VI. Conclusion

In this paper, we have jointly designed the precoder and
the feedback matrix of a block-by-block transmission scheme
equipped with a zero-forcing or minimum mean-square error
(MMSE) intra-block decision feedback detector (BDFD). The
designs minimize the arithmetic mean of the expected squared
errors at the decision point, under the standard assumption that
the previous symbols were correctly detected. The covariance
matrix of the minimized error is white, and hence the proposed
designs also minimize the (dominant components of the) bit
error rate of a uniformly bit-loaded transmission system. In
our simulations, the proposed systems performed significantly
better than standard precoding systems, and retained their
performance advantages in the presence of error propagation.
In the case of the MMSE-BDFD, the proposed design
also maximizes the Gaussian mutual information. Since the
MMSE-BDFD is a "canonical" receiver [9], [10], [23], this
suggests that by using the proposed transceiver design, one
can approach the capacity of the block transmission system
using (independent instances of) the same (Gaussian) code for
each element of the block.

Appendix I 
Algorithm for Lemma 1

To state the algorithm succinctly, we make the following 
definitions: g — (JltLi 7*1) > [S]-fc denotes the fcth column 
of $S$ and $s_{ik}$ denotes its elements; $Z_k$ denotes the first $k$
columns of $S$ and $Z_k^{\perp}$ denotes its orthogonal complement;
$P_A^{\perp} = I - A(A^H A)^{-1} A^H$. The recursion will be based on
the $(M - k) \times (M - k)$ matrix

$$A^{(k)} = (\Gamma Z_k^{\perp})^H\, P^{\perp}_{\Gamma Z_k}\, \Gamma Z_k^{\perp}. \tag{45}$$

For convenience, we assume that the elements of $\Gamma$ are
arranged in non-increasing order. The algorithm proceeds as
follows: 

1) Initialization: Set k = 1. An explicit solution for the 
first column of S is sn = W-rrr, 

y Tl Tlf 

SMi = \/#^, s n = for I = 2, 3, • • • , M - 1. 

V 7i 1m 




Fig. 6. Average BER performance of the ZF-BDFD for the various precoders
and the linear ZF detector with its optimal precoder in the narrowband multiple
antenna scenario in Section V-B with 3 transmitter antennas and 4 receiver
antennas. The legend is the same as that in Figure 5.

2) Construct $A^{(k)}$ in (45) and its eigen decomposition,
$A^{(k)} = V^{(k)} \Lambda^{(k)} (V^{(k)})^H$.

3) Set the (k + l)th column of S to be 

[S]. fe+1 = Z^V«y«, where y<*> = J j^frf , 

(k) I \\ k) -g (k) „ f 

V A l ~ A M-k 

£ = 2,3, ■■■ ,M-k-l. 

4) Increment k. If k < M — 2 return to 2. Otherwise, set 

[S]. M = ZM- 2 v(M ~ 2) y (M-1 ) . where 

(M-l) _ _ i g -xr-v 

9l — \l A (M-2)_ A (M-2) > 

(M-l) _ / A< M - 2) - g 

Vl — Y A (M-2)_ A (M-2) ■ 

Fig. 7. Average BER performance of the MMSE-BDFD for the various
precoders and the linear MMSE detector with its optimal precoder in the
narrowband multiple antenna scenario in Section V-B with 3 transmitter antennas
and 3 receiver antennas. The solid curves denote performance achieved in the
absence of error propagation, and the dashed curves incorporate the effects of
error propagation. Legend: △: optimized scheme, $F_{\text{OPT-MMSE-BDFD}}$; o: direct,
$F_I$; ×: DFT, $F_{\text{DFT}}$; □: optimized linear MMSE scheme, $F_{\text{OPT-MMSE-L}}$; ▽: OSTBC.
The dotted curve denotes the lower bound on $\tilde{P}_e$ in (40).

Fig. 8. Average BER performance of the MMSE-BDFD for the various
precoders and the linear MMSE detector with its optimal precoder in the
narrowband multiple antenna scenario in Section V-B with 3 transmitter
antennas and 4 receiver antennas. The legend is the same as that in Figure 7.

Appendix II
Analytic Performance Comparisons

It was shown in Section IV that the precoders designed
in Section III achieve the minimized value of the lower
bound on $\tilde{P}_e$; c.f., (37) and (40). Therefore, the relative BER
performance of the optimized ZF-BDFD and MMSE-BDFD
systems in the absence of error propagation can be determined
by simply comparing the optimal values of the MSE,
$\epsilon^2 = \mathrm{tr}(R_{ee})/M$. (A preliminary version of this appendix appeared
in [33], and related results on the MSEs of conventional
decision feedback equalizers appear in [2, Chapter 8].)

In order to ensure that the ZF systems exist, we will assume
that $\mathrm{rank}(H) \ge M$, and to simplify the comparisons, we will
also assume that the transmitted power $p_0$ is large enough that
$q = M$ in (28) for the MMSE-BDFD and $\ell = M$ in (44)
for the linear MMSE detector.$^{15}$ Proposition 1 states that the
minimum value of the MSE for a ZF-BDFD system is

$$\epsilon^2_{\text{OPT-ZF-BDFD}} = \frac{M}{p_0}\, |\Lambda_M|^{-1/M},$$

and Proposition 2 states that the minimum value of the MSE
for an MMSE-BDFD system is

$$\epsilon^2_{\text{OPT-MMSE-BDFD}} = \frac{M}{p_0 + \mathrm{tr}(\Lambda_M^{-1})}\, |\Lambda_M|^{-1/M}.$$

Since $\Lambda_M$ is positive definite,
$\epsilon^2_{\text{OPT-MMSE-BDFD}} < \epsilon^2_{\text{OPT-ZF-BDFD}}$,
and hence, in the absence of error propagation, the optimized
MMSE-BDFD system will provide a lower BER than the
optimized ZF-BDFD system. While it is intuitively obvious
that for a given precoder, the MMSE-BDFD will provide
a lower MSE than the ZF-BDFD, in the case of optimized
precoders, this lower MSE leads directly to a lower BER.

15 The assumption that $\mathrm{rank}(H) \ge M$ ensures that there is a threshold
value for $p_0$ above which $q = M$ and $\ell = M$.

The analysis of Section IV remains valid for systems
with linear detectors, so long as the constraint $B = 0$ is
enforced. Therefore, we can compare the BER performance
of an optimized BDFD system with that of the system that
is optimized for the corresponding linear detector by simply
comparing their minimum MSEs. The minimum MSE of a
system with a linear ZF detector is [12]

$$\epsilon^2_{\text{OPT-ZF-L}} = \frac{1}{M p_0} \bigl(\mathrm{tr}(\Lambda_M^{-1/2})\bigr)^2 \ge \frac{M}{p_0}\, |\Lambda_M|^{-1/M} = \epsilon^2_{\text{OPT-ZF-BDFD}},$$

where we have used the trace-determinant inequality (15).
Therefore, in the absence of error propagation the optimized
system for the ZF-BDFD will provide a lower BER than the
optimized system for the linear ZF detector. Similarly, the
minimum MSE of a system with a linear MMSE detector
is [6], [36]

$$\epsilon^2_{\text{OPT-MMSE-L}} = \frac{\bigl(\mathrm{tr}(\Lambda_M^{-1/2})\bigr)^2}{M \bigl(p_0 + \mathrm{tr}(\Lambda_M^{-1})\bigr)} \ge \frac{M}{p_0 + \mathrm{tr}(\Lambda_M^{-1})}\, |\Lambda_M|^{-1/M} = \epsilon^2_{\text{OPT-MMSE-BDFD}},$$

and hence the optimized system for the MMSE-BDFD provides
a lower BER than the optimized system for the linear
MMSE detector. As observed in [6],
$\epsilon^2_{\text{OPT-MMSE-L}} < \epsilon^2_{\text{OPT-ZF-L}}$,
and hence the optimized system for the linear MMSE detector
provides a lower BER than the optimized system for the linear
ZF detector.
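
The ordering of the four minimum MSEs can be checked for a concrete case. The sketch below is illustrative only: the function name, the eigenvalues and the power budget are hypothetical, and the closed-form expressions in the docstring (with negative exponents on the eigenvalue matrix) are the assumed forms of the minimum MSEs compared in this appendix, valid when the power is large enough that all M eigenmodes are active.

```python
import math

def optimized_mses(eigenvalues, p0):
    """Assumed closed-form minimum MSEs for the M largest eigenvalues lam_i:
    ZF-BDFD:   (M / p0) * det^(-1/M)
    MMSE-BDFD: M / (p0 + sum(1/lam_i)) * det^(-1/M)
    ZF-L:      (sum(lam_i^-1/2))^2 / (M * p0)
    MMSE-L:    (sum(lam_i^-1/2))^2 / (M * (p0 + sum(1/lam_i)))
    """
    M = len(eigenvalues)
    tr_inv = sum(1.0 / lam for lam in eigenvalues)
    tr_inv_sqrt = sum(1.0 / math.sqrt(lam) for lam in eigenvalues)
    det_root = math.exp(-sum(math.log(lam) for lam in eigenvalues) / M)
    return {
        "ZF-BDFD": M * det_root / p0,
        "MMSE-BDFD": M * det_root / (p0 + tr_inv),
        "ZF-L": tr_inv_sqrt ** 2 / (M * p0),
        "MMSE-L": tr_inv_sqrt ** 2 / (M * (p0 + tr_inv)),
    }

# Hypothetical eigenvalues in non-increasing order and a power budget that
# is large enough for all three eigenmodes to be active in both MMSE designs.
mse = optimized_mses([4.0, 1.0, 0.25], p0=10.0)
```

For these values the MSEs are ordered MMSE-BDFD < MMSE-L < ZF-BDFD < ZF-L; only the pairwise relations derived above hold in general, and the relative position of the linear MMSE and ZF-BDFD designs depends on the eigenvalues and the power.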